Huo Z., Gu B., Huang H. Training Neural Networks Using Features Replay. Advances in Neural Information Processing Systems (NeurIPS), 2018, pp. 6659-6668.
1. Overview
1.1. Motivation
- the computational time of the backward pass is about twice that of the forward pass
- existing methods that remove backward locking perform poorly when the DNN is deep
In this paper
- propose a novel parallel-objective formulation for the objective function of the neural network
- introduce the features replay algorithm
- prove it is guaranteed to converge
- experiments demonstrate that the proposed method achieves faster convergence, lower memory consumption and better generalization error
2. Algorithm
2.1. Formulation
- parallel-objective loss function: the network is split into K modules, each with its own objective (sketched below)
- optimal solution: the parallel-objective formulation is shown to have the same optimal solution as the original problem
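A rough sketch of the idea in simplified notation (h_k, F_k, w_k, g_k are my symbols, not the paper's; the exact equations are in the paper): every module gets its own objective whose gradient with respect to its own weights reproduces the backpropagation gradient once the error signal g_k is treated as a constant.

```latex
% Network split into K modules: $h_k = F_k(h_{k-1}; w_k)$, with $h_0 = x$.
% Original (serial) objective:
\[ \min_{w_1,\dots,w_K} \; \ell\bigl(F_K(\cdots F_1(x; w_1)\cdots ; w_K),\, y\bigr) \]
% Module-wise objectives: with $g_k := \partial \ell / \partial h_k$ treated as a
% constant received from module $k+1$,
\[ \min_{w_k} \; \bigl\langle g_k,\, F_k(h_{k-1}; w_k) \bigr\rangle \;\; (k < K),
   \qquad
   \min_{w_K} \; \ell\bigl(F_K(h_{K-1}; w_K),\, y\bigr) \]
% This recovers backpropagation because
% $\nabla_{w_k} \langle g_k, F_k(h_{k-1}; w_k) \rangle = (\partial h_k / \partial w_k)^{\top} g_k$,
% the chain-rule gradient of the original loss with respect to $w_k$.
```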
2.2. Breaking Dependency
2.2.1. New Formulation
- every module becomes independent and all modules can be updated at the same time
- module k keeps a history of its inputs of size K-k+1
2.2.2. Approximation
- the replayed (stale) input approximates the current feature, so the gradient computed from it approximates the true backpropagation gradient
2.2.3. Gradient inside Each Module
- the gradients of module k's own parameters are computed by standard backpropagation within the module, starting from its local objective
2.2.4. Pass Gradient
- module k sends the gradient with respect to its input down to module k-1, which uses it in the computation of the next iteration
2.3. Procedure
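A minimal PyTorch-style sketch of how such a procedure can be put together from the pieces above (class, method and variable names are mine; warm-up handling and device placement are simplified, so this is not the authors' implementation). Each module stores a short history of its inputs, replays the oldest one, updates its own weights with the delayed error signal, and passes a gradient down for the next iteration; conceptually each module would run on its own GPU.

```python
import torch
from collections import deque


class FeaturesReplaySketch:
    """Illustrative sketch of a features-replay style update (not the paper's code)."""

    def __init__(self, modules, optimizers, loss_fn):
        self.modules = modules          # list of K sub-networks
        self.optimizers = optimizers    # one optimizer per module
        self.loss_fn = loss_fn
        K = len(modules)
        # module k (0-indexed) keeps its last K - k inputs,
        # i.e. the K - k + 1 history of the notes in 1-based numbering
        self.history = [deque(maxlen=K - k) for k in range(K)]
        self.grad_msg = [None] * K      # delayed gradient w.r.t. module k's output

    def step(self, x, y):
        K = len(self.modules)

        # 1) forward pass on the current mini-batch (still sequential);
        #    each module only records its fresh input for later replay
        h = x
        for k, module in enumerate(self.modules):
            self.history[k].append(h.detach())
            with torch.no_grad():
                h = module(h)

        # 2) every module computes gradients from its own objective;
        #    shown serially here, but conceptually one module per GPU in parallel
        new_msg = [None] * K
        for k, (module, opt) in enumerate(zip(self.modules, self.optimizers)):
            h_in = self.history[k][0].clone().requires_grad_(True)  # replayed (stale) input
            out = module(h_in)                  # recomputed with the current weights
            if k == K - 1:
                objective = self.loss_fn(out, y)        # top module: true loss, no delay
            elif self.grad_msg[k] is not None:
                # surrogate <g, F_k(h; w_k)>: its gradient w.r.t. w_k equals the
                # chain-rule gradient with the delayed error signal g held constant
                objective = (self.grad_msg[k] * out).sum()
            else:
                continue                        # error signal has not arrived yet (warm-up)
            opt.zero_grad()
            objective.backward()                # backpropagation inside module k only
            opt.step()
            if k > 0:
                # gradient w.r.t. the replayed input, passed down to module k-1
                # and consumed at the next iteration
                new_msg[k - 1] = h_in.grad.detach()
        self.grad_msg = new_msg
```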
3. Experiments
3.1. Implementations
- CIFAR-10 and CIFAR-100
- ResNet
- data augmentation: random cropping, random horizontal flipping and normalization (see the sketch after this list)
- the K modules are distributed across K GPUs
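The augmentation above is the standard CIFAR recipe; a torchvision sketch, where the crop padding and the normalization statistics are common defaults I am assuming rather than values taken from the paper:

```python
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

# random crop (with padding), random horizontal flip, per-channel normalization;
# mean/std below are the commonly used CIFAR-10 statistics (assumed here)
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.2010)),
])

train_set = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
```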
3.2. Sufficient Direction Constant σ
- divide ResNet-164 and ResNet-101 into 4 modules
- σ > 0 at all times, so Assumption 1 is satisfied and Algorithm 1 is guaranteed to converge to critical points of the non-convex problem (the condition is sketched after this list)
- σ of the lower modules is small during the first half of the epochs; the variation of σ reflects the gap between the descent direction of FR and the steepest descent direction
- small σ in the early epochs helps the method escape saddle points and find a better local minimum
- large σ in the final epochs prevents the method from diverging
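For reference, a sketch of the sufficient-direction condition that σ refers to, in my own notation (Δ^t is the update direction actually used by FR); this is a reconstruction of the standard form of such conditions, not a quotation of Assumption 1:

```latex
% Sufficient direction (sketch, my notation): the FR update direction $\Delta^t$
% must stay positively correlated with the true gradient,
\[ \mathbb{E}\bigl[\langle \nabla f(w^t),\, \Delta^t \rangle\bigr]
   \;\ge\; \sigma \,\bigl\| \nabla f(w^t) \bigr\|^{2}
   \qquad \text{for some } \sigma > 0 . \]
% $\sigma$ close to 1 means the direction is close to steepest descent;
% a small $\sigma$ means a larger gap between the two.
```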